Data Cleansing: Beyond Integrity Analysis

Authors

  • Jonathan I. Maletic
  • Andrian Marcus
Abstract

The paper analyzes the problem of data cleansing and of automatically identifying potential errors in data sets. An overview of the small body of existing literature on data cleansing is given. Methods for error detection that go beyond integrity analysis are reviewed and presented. The applicable methods include statistical outlier detection, pattern matching, clustering, and data mining techniques. Some brief results supporting the use of such methods are given. The future research directions necessary to address the data cleansing problem are discussed.
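As an illustration of the first method the abstract names, statistical outlier detection, the following is a minimal sketch and not the authors' implementation. It assumes a simple rule: a numeric field value is flagged as a potential error when it lies more than a chosen number of sample standard deviations from the field's mean. The function name `flag_outliers`, the default threshold of 2.5, and the sample `ages` data are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.5):
    """Return indices of values farther than `threshold` sample
    standard deviations from the mean (an assumed, illustrative rule)."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


# Hypothetical "age" column containing one likely data-entry error (420).
ages = [34, 29, 41, 38, 27, 33, 36, 30, 420, 35, 31, 39]
suspects = flag_outliers(ages)
print(suspects)
```

A flagged index marks a candidate error for review rather than a confirmed one. Because the mean and standard deviation are themselves pulled toward extreme values, a rule like this can miss errors in small or heavily contaminated samples.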

Similar resources

The Practices, Perceptions, and Beliefs of Traditional Birth Attendants Regarding Early Breastfeeding Initiation in Zimbabwe: A Qualitative Study

Background & aim: Early breastfeeding initiation (EBFI) is defined as giving breast milk within the first hours following birth and is recommended as a simple strategy for the enhancement of neonatal health and survival. This descriptive qualitative study was conducted to explore the practices, perceptions and beliefs of renowned traditional birth attendants (TBA) regarding EBFI in Chipinge rur...

Cleansing and preparation of data for statistical analysis: A step necessary in oral health sciences research

In many published articles, there is still no mention of quality control processes, which might be an indication of the insufficient importance the researchers attach to undertaking or reporting such processes. However, quality control of data is one of the most important steps in research projects. Lack of sufficient attention to quality control of data might have a detrimental effect on the r...

Reducing the Risk of Insider Misuse by Revising Identity Management and User Account Data

To avoid insider computer misuse, identity and authorization data referring to the legitimate users of the systems must be properly organized and constantly and systematically analyzed and evaluated. In order to support this, a methodology for structured Identity Management has been developed. This methodology includes gathering of identity data spread among different applications, systematic c...

Analysis of Data Cleansing Approaches regarding Dirty Data - A Comparative Study

Data cleansing is the process of detecting and correcting errors and inconsistencies in a data warehouse. It deals with the identification of corrupt and duplicate data inherent in the data sets of a data warehouse in order to enhance the quality of the data. The research was directed at investigating some existing approaches and frameworks for data cleansing that attempted to solve the da...

Data Cleansing - A Prelude to Knowledge Discovery

This chapter analyzes the problem of data cleansing and the identification of potential errors in data sets. The differing views of data cleansing are surveyed and reviewed and a brief overview of existing data cleansing tools is given. A general framework of the data cleansing process is presented as well as a set of general methods that can be used to address the problem. The applicable metho...

Publication date: 2000